GPU Performance Test: H2O - XGBoost on GPU

1. Import the H2O Library and its XGBoost Estimator


In [1]:
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

2. Connect to a Running H2O Instance or Start a New One


In [2]:
%%capture
h2o.connect(ip="35.227.47.29")
h2o.no_progress()
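
The heading's second option: if no remote cluster is reachable at that address, h2o.init() from the same package starts a local H2O instance instead. A minimal sketch of that alternative:

h2o.init()         # starts a local H2O cluster if none is already running
h2o.no_progress()  # suppress progress bars, as above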

3. Load Some Data From an Amazon S3 Bucket


In [3]:
# Import some data from Amazon S3
h2oDF = h2o.import_file(path="https://s3-us-west-1.amazonaws.com/dsclouddata/LendingClubData/LoansGoodBad.csv")


# Stratified split into train/test (70/30), stratified on the response
stratsplit = h2oDF["Bad_Loan"].stratified_split(test_frac=0.3, seed=12349453)
train = h2oDF[stratsplit == "train"]
test = h2oDF[stratsplit == "test"]
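
As a sanity check, the split can be verified before modeling; a sketch assuming the frames above (roughly 70/30, with the Bad_Loan class ratio preserved in both partitions):

print(train.nrow, test.nrow)       # row counts of each partition
print(train["Bad_Loan"].table())   # class counts in the training set
print(test["Bad_Loan"].table())    # class counts in the test set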

4. Specify the Response Column


In [4]:
# Identify predictors and response
x = train.columns
y = "Bad_Loan"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()
train.head(5)


Out[4]:

RowID | Loan_Amount | Term | Interest_Rate | Employment_Years | Home_Ownership | Annual_Income | Verification_Status | Loan_Purpose | State | Debt_to_Income | Delinquent_2yr | Revolving_Cr_Util | Total_Accounts | Bad_Loan | Longest_Credit_Length
2 | 2500 | 60 months | 15.27 | 0.5 | RENT | 30000 | VERIFIED - income source | car | GA | 1 | 0 | 9.4 | 4 | BAD | 12
3 | 2400 | 36 months | 15.96 | 10 | RENT | 12252 | not verified | small_business | IL | 8.72 | 0 | 98.5 | 10 | GOOD | 10
4 | 10000 | 36 months | 13.49 | 10 | RENT | 49200 | VERIFIED - income source | other | CA | 20 | 0 | 21 | 37 | GOOD | 15
5 | 5000 | 36 months | 7.9 | 3 | RENT | 36000 | VERIFIED - income source | wedding | AZ | 11.2 | 0 | 28.3 | 12 | GOOD | 7
6 | 3000 | 36 months | 18.64 | 9 | RENT | 48000 | VERIFIED - income source | car | CA | 5.35 | 0 | 87.5 | 4 | GOOD | 4
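
The factor conversion can be double-checked by listing the levels of the response column; a small illustrative sketch:

print(train[y].levels())  # expected: [['BAD', 'GOOD']]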

5. Train an XGBoost Model on the GPU


In [12]:
%%time
XGB_GPU = H2OXGBoostEstimator(model_id="XGB_on_GPU", ntrees=200, max_depth=9,
                              learn_rate=0.05, backend="gpu", gpu_id=0)
XGB_GPU.train(x=x, y=y, training_frame=train, validation_frame=test)
print("AUC: " + str(XGB_GPU.auc()))


AUC: 0.874694485388
CPU times: user 67.8 ms, sys: 8.89 ms, total: 76.7 ms
Wall time: 11.5 s
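
Note that auc() with no arguments reports the training AUC; the AUC on the validation_frame passed to train() can be requested explicitly. A short sketch:

print("Training AUC:   " + str(XGB_GPU.auc()))
print("Validation AUC: " + str(XGB_GPU.auc(valid=True)))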

6. Train an XGBoost Model on the CPU


In [13]:
%%time
XGB_CPU = H2OXGBoostEstimator(model_id="XGB_on_CPU", ntrees=200, max_depth=9,
                              learn_rate=0.05, backend="cpu")
XGB_CPU.train(x=x, y=y, training_frame=train, validation_frame=test)
print("AUC: " + str(XGB_CPU.auc()))


AUC: 0.859213997196
CPU times: user 226 ms, sys: 21.3 ms, total: 247 ms
Wall time: 1min 24s

7. Which Option is Faster? Which Option Costs Less?


In [16]:
import numpy as np
import matplotlib.pyplot as plt

plt.rcdefaults()

objects = ('GPU', 'CPU')
y_pos = np.arange(len(objects))
seconds = [11, 84]  # rounded wall times from the two runs above
 
plt.barh(y_pos, seconds, align='center', alpha=0.5)
plt.yticks(y_pos, objects)
plt.xlabel('Seconds')
plt.title('XGBoost Training Time in Seconds')
 
plt.show()



In [15]:
# Hourly instance prices assumed above: $0.90/hr (GPU), $0.70/hr (CPU);
# multiply by 100 to convert dollars to cents, matching the axis label
GPU_Cost = (0.9 / 3600) * 11 * 100
CPU_Cost = (0.7 / 3600) * 84 * 100
objects = ('GPU Cost', 'CPU Cost')
y_pos = np.arange(len(objects))
costs = [GPU_Cost, CPU_Cost]

plt.barh(y_pos, costs, align='center', alpha=0.5)
plt.yticks(y_pos, objects)
plt.xlabel('Cents')
plt.title('Cost in Cents')
 
plt.show()
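
The two charts boil down to a pair of ratios; a quick sketch using the measured wall times and the cost figures computed above:

speedup = 84.0 / 11.0              # CPU wall time over GPU wall time
cost_ratio = CPU_Cost / GPU_Cost   # how much more the CPU run costs
print("GPU speedup:  %.1fx" % speedup)     # ~7.6x faster on GPU
print("CPU/GPU cost: %.1fx" % cost_ratio)  # GPU run is ~5.9x cheaper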